SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Diffusion model-based approaches have shown promise in data-driven planning, but they lack safety guarantees, which makes them difficult to apply in safety-critical applications. To address this challenge, we propose a new method, called SafeDiffuser, to ensure that diffusion probabilistic models satisfy
specifications by using a class of control barrier functions. The key idea of
our approach is to embed the proposed finite-time diffusion invariance into the
denoising diffusion procedure, which enables trustworthy diffusion data
generation. Moreover, we demonstrate that our finite-time diffusion invariance method not only maintains the generalization performance of the generative model but also adds robustness to safe data generation. We test our method on a
series of safe planning tasks, including maze path generation, legged robot
locomotion, and 3D space manipulation, with results showing the advantages of
robustness and guarantees over vanilla diffusion models.
Comment: 19 pages, website: https://safediffuser.github.io/safediffuser
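As an illustration of how a barrier-function-style correction could sit inside a denoising loop, the sketch below projects each intermediate waypoint back into a hand-picked safe set after every reverse-diffusion step. The circular obstacle, the closed-form projection, and the `denoise_step` callable are illustrative assumptions; the paper's finite-time diffusion invariance is a more general mechanism.

```python
import numpy as np

# Illustrative barrier: keep 2-D waypoints outside a circular obstacle.
# h(x) >= 0 defines the safe set (squared distance to obstacle minus radius^2).
OBSTACLE = np.array([0.0, 0.0])
RADIUS = 1.0

def h(x):
    """Barrier value for a single waypoint x (shape: [2])."""
    return np.sum((x - OBSTACLE) ** 2) - RADIUS ** 2

def project_to_safe(x, margin=1e-3):
    """Minimal correction pushing a waypoint back into the safe set if h(x) < 0.
    This circular constraint admits a closed-form projection; it stands in for
    a general barrier-function-based correction."""
    if h(x) >= 0.0:
        return x
    direction = x - OBSTACLE
    norm = np.linalg.norm(direction) + 1e-9
    return OBSTACLE + direction / norm * (RADIUS + margin)

def denoise_with_safety(x_T, denoise_step, num_steps=50):
    """Reverse-diffusion loop with a safety correction after every step.
    `denoise_step(x, t)` is an assumed callable returning the next (less noisy)
    trajectory of shape [num_waypoints, 2]."""
    x = x_T
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)                         # standard reverse-diffusion update
        x = np.array([project_to_safe(w) for w in x])  # enforce the barrier on each waypoint
    return x
```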
Interpreting Neural Policies with Disentangled Tree Representations
The advancement of robots, particularly those functioning in complex
human-centric environments, relies on control solutions that are driven by
machine learning. Understanding how learning-based controllers make decisions
is crucial since robots are often safety-critical systems. This calls for a formal and quantitative understanding of the explanatory factors behind the interpretability of robot learning. In this paper, we aim to study the
interpretability of compact neural policies through the lens of disentangled
representation. We leverage decision trees to obtain factors of variation [1]
for disentanglement in robot learning; these encapsulate skills, behaviors, or
strategies toward solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure the disentanglement of learned neural dynamics from the perspectives of decision concentration, mutual information, and modularity. We showcase the connection between interpretability and disentanglement consistently across an extensive experimental analysis.
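The following sketch shows one way a mutual-information-based disentanglement score (a MIG-style gap) could be computed between neuron activations and discrete factors such as decision-tree branch assignments. The binning scheme and the score itself are stand-ins, not the paper's exact metrics.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mig_score(activations, factors, n_bins=20):
    """MIG-style disentanglement score (illustrative).

    activations: [num_samples, num_neurons] continuous neuron responses
    factors:     [num_samples, num_factors] discrete factor labels
                 (e.g., decision-tree branch ids acting as factors of variation)
    """
    # Discretize continuous activations so mutual information is well defined.
    binned = np.stack(
        [np.digitize(a, np.histogram_bin_edges(a, bins=n_bins)) for a in activations.T]
    )
    gaps = []
    for k in range(factors.shape[1]):
        f = factors[:, k]
        mi = np.array([mutual_info_score(f, neuron) for neuron in binned])
        entropy = mutual_info_score(f, f)  # H(f) computed as I(f; f)
        top2 = np.sort(mi)[-2:]
        # Gap between the two most informative neurons, normalized by factor entropy.
        gaps.append((top2[1] - top2[0]) / max(entropy, 1e-9))
    return float(np.mean(gaps))
```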
On the Forward Invariance of Neural ODEs
To ensure robust and trustworthy decision-making, it is highly desirable to
enforce constraints over a neural network's parameters and its inputs
automatically by back-propagating output specifications. This way, we can
guarantee that the network makes reliable decisions under perturbations. Here,
we propose a new method for achieving a class of specification guarantees for
neural Ordinary Differential Equations (ODEs) by using invariance set
propagation. An invariance of a neural ODE is defined as an output specification, such as satisfying mathematical formulae, physical laws, or system safety constraints. We use control barrier functions to specify the invariance of a
neural ODE on the output layer and propagate it back to the input layer.
Through the invariance backpropagation, we map output specifications onto
constraints on the neural ODE parameters or its input. The satisfaction of the
corresponding constraints implies the satisfaction of output specifications.
This allows us to achieve output specification guarantees by changing the input
or parameters while maximally preserving the model performance. We demonstrate
the invariance propagation on a comprehensive series of representation learning
tasks, including spiral curve regression, autoregressive modeling of joint
physical dynamics, convexity portrait of a function, and safe neural control of
collision avoidance for autonomous vehicles.
Comment: 20 pages
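A minimal sketch of pushing an output specification back to the input of a neural ODE: a fixed-step Euler integrator plays the role of the ODE solver, and a gradient correction on the input stands in for the paper's invariance set propagation. The barrier function and network below are illustrative placeholders.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Small vector field f(t, x) defining the neural ODE dynamics."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, x):
        return self.net(x)

def integrate(func, x0, t0=0.0, t1=1.0, steps=20):
    """Fixed-step Euler integration of the neural ODE (fully differentiable)."""
    x, dt = x0, (t1 - t0) / steps
    for i in range(steps):
        x = x + dt * func(t0 + i * dt, x)
    return x

def barrier(x):
    """Illustrative output specification h(x) >= 0: stay inside a ball of radius 2."""
    return 4.0 - (x ** 2).sum(dim=-1)

def enforce_on_input(func, x0, lr=0.05, iters=100):
    """Adjust the input until the ODE output satisfies the barrier.
    This gradient correction is a simple stand-in for invariance propagation."""
    x0 = x0.clone().requires_grad_(True)
    for _ in range(iters):
        violation = torch.relu(-barrier(integrate(func, x0))).sum()
        if violation.item() == 0.0:
            break
        violation.backward()
        with torch.no_grad():
            x0 -= lr * x0.grad
        x0.grad = None
    return x0.detach()
```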
Towards Generalist Robots: A Promising Paradigm via Generative Simulation
This document serves as a position paper that outlines the authors' vision
for a potential pathway towards generalist robots. The purpose of this document
is to share the excitement of the authors with the community and highlight a
promising research direction in robotics and AI. The authors believe the
proposed paradigm is a feasible path towards accomplishing the long-standing
goal of robotics research: deploying robots, or embodied AI agents more
broadly, in various non-factory real-world settings to perform diverse tasks.
This document presents a specific idea for mining knowledge in the latest
large-scale foundation models for robotics research. Instead of directly using
or adapting these models to produce low-level policies and actions, it
advocates for a fully automated generative pipeline (termed generative
simulation), which uses these models to generate diversified tasks, scenes and
training supervisions at scale, thereby scaling up low-level skill learning and
ultimately leading to a foundation model for robotics that empowers generalist
robots. The authors are actively pursuing this direction, but in the meantime,
they recognize that the ambitious goal of building generalist robots with
large-scale policy training demands significant resources such as computing
power and hardware, and research groups in academia alone may face severe
resource constraints in implementing the entire vision. Therefore, the authors
believe sharing their thoughts at this early stage could foster discussions,
attract interest towards the proposed pathway and related topics from industry
groups, and potentially spur significant technical advancements in the field.
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
We present RoboGen, a generative robotic agent that automatically learns
diverse robotic skills at scale via generative simulation. RoboGen leverages
the latest advancements in foundation and generative models. Instead of
directly using or adapting these models to produce policies or low-level
actions, we advocate for a generative scheme, which uses these models to
automatically generate diversified tasks, scenes, and training supervisions,
thereby scaling up robotic skill learning with minimal human supervision. Our
approach equips a robotic agent with a self-guided propose-generate-learn
cycle: the agent first proposes interesting tasks and skills to develop, and
then generates corresponding simulation environments by populating pertinent
objects and assets with proper spatial configurations. Afterwards, the agent
decomposes the proposed high-level task into sub-tasks, selects the optimal
learning approach (reinforcement learning, motion planning, or trajectory
optimization), generates required training supervision, and then learns
policies to acquire the proposed skill. Our work attempts to extract the
extensive and versatile knowledge embedded in large-scale models and transfer
it to the field of robotics. Our fully generative pipeline can be queried
repeatedly, producing an endless stream of skill demonstrations associated with
diverse tasks and environments.
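A schematic sketch of the propose-generate-learn cycle described above; every callable here (task proposal, scene generation, decomposition, learner selection, policy training) is an assumed placeholder rather than RoboGen's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Task:
    description: str                          # natural-language task proposed by the agent
    sub_tasks: List[str] = field(default_factory=list)  # decomposition filled in later

def propose_generate_learn(
    propose_task: Callable[[], Task],          # queries a foundation model for a new task
    build_scene: Callable[[Task], object],     # populates a simulation with relevant assets
    decompose: Callable[[Task], List[str]],    # splits the task into sub-tasks
    select_learner: Callable[[str], str],      # picks RL / motion planning / trajectory opt.
    train_policy: Callable[[object, str, str], object],  # learns a policy for one sub-task
    num_cycles: int = 3,
):
    """Skeleton of a self-guided propose-generate-learn cycle (illustrative)."""
    skills = []
    for _ in range(num_cycles):
        task = propose_task()
        scene = build_scene(task)
        task.sub_tasks = decompose(task)
        for sub in task.sub_tasks:
            learner = select_learner(sub)       # e.g., "rl", "motion_planning", "traj_opt"
            skills.append(train_policy(scene, sub, learner))
    return skills
```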
Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality Gap
There is an ever-growing zoo of modern neural network models that can
efficiently learn end-to-end control from visual observations. These advanced
deep models, ranging from convolutional to patch-based networks, have been
extensively tested on offline image classification and regression tasks. In
this paper, we study these vision architectures with respect to the open-loop
to closed-loop causality gap, i.e., offline training followed by an online
closed-loop deployment. This causality gap typically emerges in robotics
applications such as autonomous driving, where a network is trained to imitate
the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with the offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution conditions. Contrary to recently reported
results, we show that under proper training guidelines, all vision models
perform indistinguishably well on in-distribution deployment, resolving the
causality gap. In situation 2, we observe that the causality gap disrupts performance regardless of the choice of model architecture. Our results imply that, in situation one, the causality gap can be closed by following our proposed training guideline with any modern network architecture, whereas achieving out-of-distribution generalization (situation two) requires further investigation, for instance into data diversity rather than model architecture.
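The contrast between open-loop and closed-loop evaluation can be made concrete with the sketch below: offline, the model is scored on logged observations it never influences, while in closed-loop deployment its own actions determine the next observation, so errors can compound. The classic Gym-style environment interface is an assumption made for illustration.

```python
import numpy as np

def open_loop_error(model, observations, expert_actions):
    """Offline metric: imitation error on logged data; the model never
    influences which observations it sees next."""
    predictions = np.array([model(obs) for obs in observations])
    return float(np.mean((predictions - expert_actions) ** 2))

def closed_loop_rollout(model, env, horizon=200):
    """Online metric: the model's own actions drive the next observation, so
    small errors can compound (the open-loop to closed-loop causality gap).
    `env` is assumed to follow the classic Gym reset()/step() interface."""
    obs = env.reset()
    total_reward, crashed = 0.0, False
    for _ in range(horizon):
        action = model(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            crashed = info.get("crash", False)
            break
    return total_reward, crashed
```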
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
As autonomous driving technology matures, end-to-end methodologies have
emerged as a leading strategy, promising seamless integration from perception
to control via deep learning. However, existing systems grapple with challenges
such as unexpected open set environments and the complexity of black-box
models. At the same time, the evolution of deep learning has introduced larger multimodal foundation models, offering joint visual and textual understanding. In this paper, we harness these multimodal foundation models to
enhance the robustness and adaptability of autonomous driving systems, enabling
out-of-distribution, end-to-end, multimodal, and more explainable autonomy.
Specifically, we present an approach for end-to-end, open-set (any environment/scene) autonomous driving that is capable of producing driving decisions from representations queryable by image and text. To do so, we
introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers, enabling the encapsulation of both spatial and semantic information. Our approach (i) demonstrates unparalleled results in diverse tests
while achieving significantly greater robustness in out-of-distribution
situations, and (ii) allows the incorporation of latent space simulation (via
text) for improved training (data augmentation via text) and policy debugging.
We encourage the reader to check our explainer video at
https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be and to view the
code and demos on our project webpage at https://drive-anywhere.github.io/.
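As a rough illustration of querying patch-aligned visual features with text, the sketch below scores each spatial patch against a text embedding by cosine similarity to form a relevance heatmap. The shared vision-language embedding space and the tensor shapes are assumptions; this is not the paper's feature-extraction method.

```python
import torch
import torch.nn.functional as F

def text_query_heatmap(patch_features, text_embedding):
    """Score each spatial patch against a text query by cosine similarity.

    patch_features: [H, W, D] patch-aligned visual features from a vision transformer
    text_embedding: [D] embedding of the query (e.g., "pedestrian crossing")
    Both are assumed to live in a shared vision-language embedding space.
    """
    patches = F.normalize(patch_features.reshape(-1, patch_features.shape[-1]), dim=-1)
    text = F.normalize(text_embedding, dim=-1)
    scores = patches @ text                           # cosine similarity per patch
    return scores.reshape(patch_features.shape[:2])   # [H, W] relevance heatmap

# Example usage with random tensors standing in for real encoder outputs.
if __name__ == "__main__":
    feats = torch.randn(14, 14, 512)   # e.g., a 14x14 patch grid of 512-dim features
    query = torch.randn(512)
    heatmap = text_query_heatmap(feats, query)
    print(heatmap.shape)               # torch.Size([14, 14])
```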